{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hyperparameter Tuning with Chainer\n",
    "\n",
    "[VGG](https://arxiv.org/pdf/1409.1556v6.pdf) is an architecture for deep convolution networks. In this example, we use convolutional networks to perform image classification using the CIFAR-10 dataset. CIFAR-10 consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.\n",
    "\n",
    "We'll use SageMaker's hyperparameter tuning to train multiple convolutional networks, experimenting with different hyperparameter combinations. After that, we'll find the model with the best performance, deploy it to Amazon SageMaker hosting, and then classify images using the deployed model.\n",
    "\n",
    "This notebook uses the Chainer script and estimator setup from [the \"Training with Chainer\" notebook](files/chainer_single_machine_cifar10.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Setup\n",
    "from sagemaker import get_execution_role\n",
    "import sagemaker\n",
    "\n",
    "sagemaker_session = sagemaker.Session()\n",
    "\n",
    "# This role retrieves the SageMaker-compatible role used by this notebook instance.\n",
    "role = get_execution_role()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading training and test data\n",
    "\n",
    "We use helper functions provided by `chainer` to download and preprocess the CIFAR10 data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import chainer\n",
    "\n",
    "from chainer.datasets import get_cifar10\n",
    "\n",
    "train, test = get_cifar10()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uploading the data\n",
    "\n",
    "We save the preprocessed data to the local filesystem, and then use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value `inputs` identifies the S3 location, which we will use when we start the Training Job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import shutil\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "train_data = [element[0] for element in train]\n",
    "train_labels = [element[1] for element in train]\n",
    "\n",
    "test_data = [element[0] for element in test]\n",
    "test_labels = [element[1] for element in test]\n",
    "\n",
    "\n",
    "try:\n",
    "    os.makedirs('/tmp/data/train_cifar')\n",
    "    os.makedirs('/tmp/data/test_cifar')\n",
    "    np.savez('/tmp/data/train_cifar/train.npz', data=train_data, labels=train_labels)\n",
    "    np.savez('/tmp/data/test_cifar/test.npz', data=test_data, labels=test_labels)\n",
    "    train_input = sagemaker_session.upload_data(\n",
    "                      path=os.path.join('/tmp', 'data', 'train_cifar'),\n",
    "                      key_prefix='notebook/chainer_cifar/train')\n",
    "    test_input = sagemaker_session.upload_data(\n",
    "                      path=os.path.join('/tmp', 'data', 'test_cifar'),\n",
    "                      key_prefix='notebook/chainer_cifar/test')\n",
    "finally:\n",
    "    shutil.rmtree('/tmp/data')\n",
    "print('training data at %s' % train_input)\n",
    "print('test data at %s' % test_input)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing the Chainer script\n",
    "\n",
    "We use a single script to train and host a Chainer model. The training part is similar to a script you might run outside of SageMaker.\n",
    "\n",
    "The hosting part requires implementing certain functions. Here, we've defined only `model_fn()`, which loads model artifacts that were created during training. The other functions will take on default values as described [here](https://github.com/aws/sagemaker-python-sdk#model-serving).\n",
    "\n",
    "For a more in-depth discussion of this script see [the \"Training with Chainer\" notebook](files/chainer_single_machine_cifar10.ipynb).\n",
    "\n",
    "For more on writing Chainer scripts to run on SageMaker, or for more on the Chainer container itself, please see the following repositories: \n",
    "\n",
    "* For writing Chainer scripts to run on SageMaker: https://github.com/aws/sagemaker-python-sdk\n",
    "* For more on the Chainer container and default hosting functions: https://github.com/aws/sagemaker-chainer-containers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "!pygmentize 'src/chainer_cifar_vgg_single_machine.py'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running hyperparameter tuning jobs on SageMaker\n",
    "\n",
    "To specify options for a training job using Chainer, we construct a `Chainer` estimator using the [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk). We pass in an `entry_point`, the name of a script that contains a couple of functions with certain signatures (`train()` and `model_fn()`), and a `source_dir`, a directory containing all code to run inside the Chainer container. This script will be run on SageMaker in a container that invokes these functions to train and load Chainer models.\n",
    "\n",
    "For this example, we're specifying the number of epochs to be 1 for the purposes of demonstration. We suggest at least 50 epochs for a more meaningful result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from sagemaker.chainer.estimator import Chainer\n",
    "\n",
    "chainer_estimator = Chainer(entry_point='chainer_cifar_vgg_single_machine.py',\n",
    "                            source_dir=\"src\",\n",
    "                            role=role,\n",
    "                            sagemaker_session=sagemaker_session,\n",
    "                            train_instance_count=1,\n",
    "                            train_instance_type='ml.p2.xlarge',\n",
    "                            hyperparameters={'epochs': 1, 'batch-size': 64})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then need to pass this estimator to a `HyperparameterTuner`. For the `HyperparameterTuner` class, we define the following options for running hyperparameter tuning jobs:\n",
    "* `hyperparameter_ranges`: the hyperparameters we'd like to tune and their possible values. We have three different types of hyperparameters that can be tuned: categorical, continuous, and integer.\n",
    "* `objective_metric_name`: the objective metric we'd like to tune.\n",
    "* `metric_definitions`: the name of the objective metric as well as the regular expression (regex) used to extract the metric from the CloudWatch logs of each training job.\n",
    "* `max_jobs`: number of training jobs to run in total.\n",
    "* `max_parallel_jobs`: number of training jobs to run simultaneously."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, we are going to tune on learning rate. In general, if possible, it's best to specify a value as the least restrictive type, so we define learning rate as a continuous parameter ranging between 0.5 and 0.6 rather than, say, a categorical parameter with possible values of 0.5, 0.55, and 0.6."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.tuner import ContinuousParameter\n",
    "\n",
    "hyperparameter_ranges = {'learning-rate': ContinuousParameter(0.05, 0.06)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define our objective metric, which we use to evaluate each training job. This consists of a name and a regex. The training script in this example uses Chainer's [`PrintReport`](https://docs.chainer.org/en/stable/reference/generated/chainer.training.extensions.PrintReport.html) to print out metrics for each epoch, which looks something like this when run for 50 epochs (truncated here):\n",
    "\n",
    "```\n",
    "epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time\n",
    "#033[J1           2.33857     1.86438               0.175811       0.254479                  47.5526\n",
    "#033[J2           1.78559     1.59937               0.298095       0.376493                  79.5099\n",
    "#033[J3           1.50956     1.38693               0.422015       0.469646                  111.372\n",
    "...\n",
    "#033[J48          0.378797    0.573417              0.879842       0.821955                  1548.58\n",
    "#033[J49          0.373226    0.573498              0.879516       0.812201                  1580.56\n",
    "#033[J50          0.369154    0.485158              0.882242       0.843451                  1612.49\n",
    "```\n",
    "\n",
    "The regex we use captures the fourth number in the last row, which is the validation accuracy for the final epoch in the training job. Because we're using only one epoch for demonstration purposes, our regex has 'J1' in it, but the '1' should be replaced with the number of epochs used for each training job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "objective_metric_name = 'Validation-accuracy'\n",
    "metric_definitions = [{'Name': 'Validation-accuracy', 'Regex': '\\[J1\\s+\\d\\.\\d+\\s+\\d\\.\\d+\\s+\\d\\.\\d+\\s+(\\d\\.\\d+)'}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we need to define how many training jobs to run. We recommend you set the parallel jobs value to less than 10% of the total number of training jobs, but we are setting it higher here to keep this example short. We are also setting `max_jobs` to a low value to shorten the time needed for the hyperparameter tuning job to complete, but note that running only two jobs won't demonstrate any meaningful hyperparameter tuning results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_jobs = 2\n",
    "max_parallel_jobs = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.tuner import HyperparameterTuner\n",
    "\n",
    "chainer_tuner = HyperparameterTuner(estimator=chainer_estimator,\n",
    "                                    objective_metric_name=objective_metric_name,\n",
    "                                    hyperparameter_ranges=hyperparameter_ranges,\n",
    "                                    metric_definitions=metric_definitions,\n",
    "                                    max_jobs=max_jobs,\n",
    "                                    max_parallel_jobs=max_parallel_jobs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our tuner, we can now invoke `fit()` to start a hyperparameter tuning job:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chainer_tuner.fit({'train': train_input, 'test': test_input})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Waiting for the hyperparameter tuning job to complete\n",
    "\n",
    "Now we wait for the hyperparameter tuning job to complete. We have a convenience method, `wait()`, that will block until the hyperparameter tuning job has completed. We can call that here to see if the hyperparameter tuning job is still running; the cell will finish running when the hyperparameter tuning job has completed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chainer_tuner.wait()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploying the Trained Model\n",
    "\n",
    "After training, we use the tuner object to create and deploy a hosted prediction endpoint with the best training job. We can use a CPU-based instance for inference (in this case an `ml.m4.xlarge`), even though we trained on GPU instances.\n",
    "\n",
    "The predictor object returned by `deploy()` lets us call the new endpoint and perform inference on our sample images using the model from the best training job found during hyperparameter tuning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "predictor = chainer_tuner.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CIFAR10 sample images\n",
    "\n",
    "We'll use these CIFAR10 sample images to test the service:\n",
    "\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/airplane1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/automobile1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/bird1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/cat1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/deer1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/dog1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/frog1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/horse1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/ship1.png\" />\n",
    "<img style=\"display: inline; height: 32px; margin: 0.25em\" src=\"images/truck1.png\" />\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting using SageMaker Endpoint\n",
    "\n",
    "We batch the images together into a single NumPy array to obtain multiple inferences with a single prediction request."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from skimage import io\n",
    "import numpy as np\n",
    "\n",
    "def read_image(filename):\n",
    "    img = io.imread(filename)\n",
    "    img = np.array(img).transpose(2, 0, 1)\n",
    "    img = np.expand_dims(img, axis=0)\n",
    "    img = img.astype(np.float32)\n",
    "    img *= 1. / 255.\n",
    "    img = img.reshape(3, 32, 32)\n",
    "    return img\n",
    "\n",
    "\n",
    "def read_images(filenames):\n",
    "    return np.array([read_image(f) for f in filenames])\n",
    "\n",
    "filenames = ['images/airplane1.png',\n",
    "             'images/automobile1.png',\n",
    "             'images/bird1.png',\n",
    "             'images/cat1.png',\n",
    "             'images/deer1.png',\n",
    "             'images/dog1.png',\n",
    "             'images/frog1.png',\n",
    "             'images/horse1.png',\n",
    "             'images/ship1.png',\n",
    "             'images/truck1.png']\n",
    "\n",
    "image_data = read_images(filenames)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predictor runs inference on our input data and returns a list of predictions whose argmax gives the predicted label of the input data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "response = predictor.predict(image_data)\n",
    "\n",
    "for i, prediction in enumerate(response):\n",
    "    print('image {}: prediction: {}'.format(i, prediction.argmax(axis=0)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup\n",
    "\n",
    "After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "chainer_tuner.delete_endpoint()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  },
  "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
